Advances in detecting object classes and their semantic parts
Object classes are central to computer vision and have been the focus of substantial
research in the last fifteen years. This thesis addresses the tasks of localizing entire
objects in images (object class detection) and localizing their semantic parts (part detection).
We present four contributions, two for each task. The first two improve
existing object class detection techniques by using context and calibration. The other
two contributions explore semantic part detection in weakly-supervised settings.
First, the thesis presents a technique for predicting properties of the objects in an
image based only on the image's global appearance. We demonstrate the method by predicting three
properties: aspect of appearance, location in the image and class membership. Overall,
the technique makes multi-component object detectors faster and improves their
performance.
The second contribution is a method for calibrating the popular Ensemble of
Exemplar-SVMs object detector. Unlike the standard approach, which calibrates each
Exemplar-SVM independently, our technique optimizes their joint performance as an ensemble.
We devise an efficient optimization algorithm to find the globally optimal solution of the
calibration problem. This leads to better object detection performance compared to
using independent calibration.
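The contrast between independent and joint calibration can be illustrated with a toy sketch. The snippet below is a simplified stand-in, not the thesis's algorithm: it fits one affine transform (scale, offset) per Exemplar-SVM, but evaluates a single logistic loss on the max-pooled ensemble score, so all transforms are optimized jointly. All data and parameter names are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: raw scores of 3 exemplar SVMs on 200 candidate windows.
rng = np.random.default_rng(0)
n_exemplars, n_windows = 3, 200
scores = rng.normal(size=(n_exemplars, n_windows))
labels = (rng.random(n_windows) < 0.3).astype(float)  # 1 = true object window
scores[:, labels == 1] += 1.5                         # positives score higher

def ensemble_loss(params):
    """Logistic loss of the max-pooled ensemble after affine calibration."""
    a = params[:n_exemplars, None]           # per-exemplar scale
    b = params[n_exemplars:, None]           # per-exemplar offset
    cal = a * scores + b                     # calibrated scores
    s = np.clip(cal.max(axis=0), -30, 30)    # ensemble score per window
    p = 1.0 / (1.0 + np.exp(-s))             # sigmoid probability
    eps = 1e-9
    return -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))

# Start from the identity calibration and optimize all parameters jointly.
x0 = np.concatenate([np.ones(n_exemplars), np.zeros(n_exemplars)])
res = minimize(ensemble_loss, x0, method="Nelder-Mead")
print(res.fun <= ensemble_loss(x0))  # joint fit is no worse than identity
```

The key design point mirrored here is that the loss is a function of the *ensemble* output, so each exemplar's parameters are chosen in light of the others, unlike per-exemplar Platt scaling.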
The third contribution is a technique for training part-based models of object classes
using data sourced from the web. We learn rich models incrementally. Our models encompass
the appearance of parts and their spatial arrangement on the object, specific to
each viewpoint. Importantly, our method does not require any part location annotations,
a requirement that is one of the main obstacles to training many part detectors.
Finally, the last contribution is a study on whether semantic object parts emerge in
Convolutional Neural Networks trained for higher-level tasks, such as image classification.
While previous efforts studied this matter by visual inspection only, we perform
an extensive quantitative analysis based on ground-truth part location annotations. This
provides a more conclusive answer to the question.
Influence of halogenated flame retardants and nanostructured clays on the thermal stability and reaction to fire of polystyrene-based polymer systems
Embargoed for reasons of secrecy and/or ownership of the results and information by external bodies or private companies that participated in the research work related to the thesis
Do semantic parts emerge in Convolutional Neural Networks?
Semantic object parts can be useful for several visual recognition tasks.
Lately, these tasks have been addressed using Convolutional Neural Networks
(CNN), achieving outstanding results. In this work we study whether CNNs learn
semantic parts in their internal representation. We investigate the responses
of convolutional filters and try to associate their stimuli with semantic
parts. We perform two extensive quantitative analyses. First, we use
ground-truth part bounding-boxes from the PASCAL-Part dataset to determine how
many of those semantic parts emerge in the CNN. We explore this emergence for
different layers, network depths, and supervision levels. Second, we collect
human judgements in order to study what fraction of all filters systematically
fire on any semantic part, even if not annotated in PASCAL-Part. Moreover, we
explore several connections between discriminative power and semantics. We
determine which filters are the most discriminative for object recognition, and
analyze whether they respond to semantic parts or to other image patches. We
also investigate the other direction: we determine which semantic parts are the
most discriminative and whether they correspond to those parts emerging in the
network. This enables us to gain an even deeper understanding of the role of
semantic parts in the network.
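The counting protocol behind the first analysis can be sketched as follows. This is a hedged simplification, with invented data and function names: a filter is counted as "firing on" a semantic part if the receptive field of its strongest activation overlaps a ground-truth part box with sufficient intersection-over-union.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def emerged_filters(top_activation_boxes, part_boxes, thresh=0.4):
    """Indices of filters whose top-activation receptive field hits a part."""
    hits = []
    for f, act_box in enumerate(top_activation_boxes):
        if any(iou(act_box, pb) >= thresh for pb in part_boxes):
            hits.append(f)
    return hits

# Toy example: filter 0 lands on the part box, filter 1 fires elsewhere.
parts = [[10, 10, 30, 30]]                   # ground-truth part boxes
acts = [[12, 11, 28, 29], [60, 60, 90, 90]]  # top-activation RF boxes
print(emerged_filters(acts, parts))          # → [0]
```

Sweeping this count over layers, depths, and supervision levels gives the kind of quantitative emergence curves the abstract describes, in place of visual inspection.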
Objects as Context for Detecting Their Semantic Parts
We present a semantic part detection approach that effectively leverages
object information. We use the object appearance and its class as indicators of
what parts to expect. We also model the expected relative location of parts
inside the objects based on their appearance. We achieve this with a new
network module, called OffsetNet, that efficiently predicts a variable number
of part locations within a given object. Our model incorporates all these cues
to detect parts in the context of their objects. This leads to considerably
higher performance for the challenging task of part detection compared to using
part appearance alone (+5 mAP on the PASCAL-Part dataset). We also compare to
other part detection methods on both the PASCAL-Part and CUB200-2011 datasets.
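The core idea of predicting a variable number of part locations within an object can be sketched in a minimal form. This is not the trained OffsetNet module: the weights are random stand-ins and all names are hypothetical. It shows only the interface the abstract describes, regressing a relative offset plus a presence score per candidate part and keeping the parts predicted present.

```python
import numpy as np

rng = np.random.default_rng(1)
feat_dim, n_parts = 8, 4
W = rng.normal(scale=0.5, size=(n_parts, 3, feat_dim))  # per-part regressor
b = rng.normal(scale=0.5, size=(n_parts, 3))            # (dx, dy, presence logit)

def sig(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_parts(obj_feat, obj_box, thresh=0.5):
    """Return (part_index, x, y) for each part whose presence score passes."""
    x1, y1, x2, y2 = obj_box
    w, h = x2 - x1, y2 - y1
    out = W @ obj_feat + b          # (n_parts, 3)
    presence = sig(out[:, 2])
    results = []
    for p in range(n_parts):
        if presence[p] >= thresh:   # variable number of detected parts
            cx = x1 + sig(out[p, 0]) * w   # offset stays inside the object box
            cy = y1 + sig(out[p, 1]) * h
            results.append((p, float(cx), float(cy)))
    return results

# Toy query: one object box with a random appearance feature.
detected = predict_parts(rng.normal(size=feat_dim), (0, 0, 100, 60))
print(len(detected))  # between 0 and 4, depending on presence scores
```

Conditioning the regressor on the object's appearance and class, as the abstract describes, is what lets the model expect different part sets for different objects; here that conditioning is reduced to a single feature vector for brevity.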